Case Studies: Memory Behavior of Multithreaded Multimedia and AI Applications
Abstract
Memory performance has become a dominant factor for today's microprocessor applications. In this paper, we study the memory reference behavior of emerging multimedia and AI applications. We compare the memory performance of sequential and multithreaded versions of the applications on multithreaded processors. Our methodology includes workload selection and parallelization, benchmarking and measurement, memory trace collection and verification, and trace-driven memory performance simulation. The case studies show that multithreading can produce opposite reference behavior in different programs: the threads' memory references may interact constructively or disruptively. Care must be taken to ensure that disruptive memory references do not outweigh the benefit of parallelization.
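To make the idea of trace-driven simulation concrete, here is a minimal sketch, not the authors' simulator. It replays address traces through a tiny LRU set-associative cache and counts misses for a sequential run versus a round-robin interleaved (multithreaded) run. The cache geometry, the synthetic traces, and the names `simulate_cache` and `interleave` are all illustrative assumptions; the paper's actual workloads and cache parameters are not given in this abstract.

```python
from collections import OrderedDict


def simulate_cache(trace, num_sets=4, ways=2, line_size=64):
    """Count misses for a list of byte addresses on an LRU set-associative cache.

    The geometry (4 sets x 2 ways x 64-byte lines) is an assumption chosen
    to keep the example small; it is not taken from the paper.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        tag = line // num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[tag] = True
    return misses


def interleave(a, b):
    """Round-robin merge of two equal-length thread traces (a crude stand-in
    for fine-grained multithreaded execution)."""
    merged = []
    for x, y in zip(a, b):
        merged.extend((x, y))
    return merged


# Constructive case: both threads read the same 16-line array, which is
# larger than the 8-line cache. Interleaved, the second thread hits on
# lines the first thread has just fetched.
shared = [i * 64 for i in range(16)]
constructive_seq = simulate_cache(shared + shared)            # 32 misses
constructive_mt = simulate_cache(interleave(shared, shared))  # 16 misses

# Disruptive case: each thread loops over a private 8-line working set
# that fits in the cache alone but thrashes when the two are interleaved.
thread_a = [i * 64 for i in range(8)] * 4
thread_b = [4096 + i * 64 for i in range(8)] * 4
disruptive_seq = simulate_cache(thread_a + thread_b)          # 16 misses
disruptive_mt = simulate_cache(interleave(thread_a, thread_b))  # 64 misses
```

In this toy setup the shared-data workload halves its misses when interleaved, while the private-working-set workload quadruples them, which mirrors the constructive and disruptive behaviors the abstract describes.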
Similar resources
Memory Hierarchy Studies of Multimedia-enhanced Simultaneous Multithreaded Processors for MPEG-2 Video Decompression
This paper explores cache models for a simultaneous multithreaded processor with multimedia enhancements. We start with a wide-issue superscalar processor, enhance it by the simultaneous multithreading (SMT) technique, by multimedia units, and by an additional on-chip RAM storage. Our workload is a multithreaded MPEG-2 video decompression algorithm that extensively uses multimedia units. Variou...
Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel® OpenMP* Compiler
Processors with Hyper-Threading technology can improve the performance of applications by permitting a single processor to process data as if it were two processors by executing instructions from different threads in parallel rather than serially. However, the potential performance improvement can be only obtained if an application is multithreaded by parallelization techniques. This paper pres...
Multithreaded Input-Sensitive Profiling
Input-sensitive profiling is a recent performance analysis technique that makes it possible to estimate the empirical cost function of individual routines of a program, helping developers understand how performance scales to larger inputs and pinpoint asymptotic bottlenecks in the code. A current limitation of input-sensitive profilers is that they specifically target sequential computations, i...
The Case for Region Serializability
It is difficult to write correct multithreaded code. This difficulty is compounded by the weak memory model [1] provided to multithreaded applications running on commodity multicore hardware, where there is not an easily understood semantics for applications containing data races. For example, the DRF0 memory model only guarantees sequential consistency to data-race free programs [2], and while...
The Inherent Variability of Multithreaded Commercial Workloads Can Lead to Incorrect Results in Architectural Simulation
Multithreaded, throughput-oriented commercial applications, such as databases and Web servers, represent a dominant class of Internet service workloads. Computer architects increasingly optimize current and future server architectures (such as multithreaded processors and chip multiprocessors) for these important workloads. Architects use multithreaded benchmarks to evaluate alternative designs...